ATOM Documentation

← Back to App

Database Query Standard

Overview

This document defines the standard patterns for database queries in the ATOM SaaS backend. Consistent query patterns improve performance, maintainability, and reduce bugs.

Decision Framework

When to Use ORM (SQLAlchemy)

✅ **Use ORM for:**

  • Single table queries (CRUD operations)
  • Simple relationships (joins with foreign keys)
  • Business logic that benefits from object mapping
  • Queries that benefit from SQLAlchemy's session management
  • Read operations that need relationship loading

**Examples:**

# ✅ GOOD: Simple CRUD
agent = db.query(AgentRegistry).filter(
    AgentRegistry.id == agent_id,
    AgentRegistry.tenant_id == tenant_id
).first()

# ✅ GOOD: Relationship loading
proposal = db.query(AgentProposal).options(
    joinedload(AgentProposal.agent)
).filter(AgentProposal.id == proposal_id).first()

# ✅ GOOD: Create/Update
new_agent = AgentRegistry(
    id=str(uuid.uuid4()),
    tenant_id=tenant_id,
    name=name,
    role=role
)
db.add(new_agent)
db.commit()

When to Use Raw SQL

✅ **Use Raw SQL for:**

  • Complex aggregations (GROUP BY, window functions)
  • Performance-critical queries
  • Complex joins across many tables
  • Queries where ORM overhead is significant
  • Bulk operations
  • Analytics and reporting

**Examples:**

# ✅ GOOD: Complex aggregation
result = await db.execute(text("""
    SELECT
        agent_id,
        COUNT(*) as total_executions,
        AVG(execution_time) as avg_time,
        PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY execution_time) as median_time
    FROM agent_executions
    WHERE tenant_id = :tenant_id
    GROUP BY agent_id
"""), {"tenant_id": tenant_id})

# ✅ GOOD: Performance-critical query
agents = await db.execute(text("""
    SELECT id, name, role, status
    FROM agent_registry
    WHERE tenant_id = :tenant_id
      AND enabled = true
      AND status = :status
    ORDER BY confidence_score DESC
    LIMIT :limit
"""), {"tenant_id": tenant_id, "status": "autonomous", "limit": 10})

Standard Patterns

1. Tenant Isolation (CRITICAL)

**ALWAYS filter by tenant_id in queries:**

# ✅ GOOD: Explicit tenant filtering
result = await db.execute(
    select(AgentRegistry).where(
        AgentRegistry.tenant_id == tenant_id
    )
)

# ❌ BAD: Missing tenant filter
result = await db.execute(
    select(AgentRegistry)
)

2. Async Queries

**Use async/await for database operations:**

# ✅ GOOD: Async query
result = await db.execute(
    select(AgentRegistry).where(AgentRegistry.id == agent_id)
)
agent = result.scalar_one_or_none()

# ❌ BAD: Sync query in async context
agent = db.query(AgentRegistry).filter(AgentRegistry.id == agent_id).first()

3. Query Helpers

**Use helper functions for common patterns:**

# ✅ GOOD: Using helper
agent = await get_agent_by_id(db, agent_id, tenant_id)

# Instead of:
result = await db.execute(
    select(AgentRegistry).where(
        AgentRegistry.id == agent_id,
        AgentRegistry.tenant_id == tenant_id
    )
)
agent = result.scalar_one_or_none()

4. Error Handling

**Handle missing results gracefully:**

# ✅ GOOD: Handle missing data
result = await db.execute(
    select(AgentRegistry).where(AgentRegistry.id == agent_id)
)
agent = result.scalar_one_or_none()

if not agent:
    raise ValueError(f"Agent {agent_id} not found")

# ❌ BAD: Crash on missing data
agent = result.scalar_one()  # Raises NoResultFound

5. Bulk Operations

**Use bulk operations for performance:**

# ✅ GOOD: Bulk insert
await db.execute(
    insert(AgentProposal).values([
        {
            'id': str(uuid.uuid4()),
            'agent_id': agent_id,
            'tenant_id': tenant_id,
            # ... other fields
        }
        for _ in range(100)
    ])
)
await db.commit()

# ❌ BAD: Loop insert
for i in range(100):
    proposal = AgentProposal(...)
    db.add(proposal)
await db.commit()  # Very slow

Query Helpers

Common Query Patterns

from typing import Optional, List
from sqlalchemy import select, and_
from sqlalchemy.ext.asyncio import AsyncSession

async def get_by_id(
    db: AsyncSession,
    model,
    id: str,
    tenant_id: str
):
    """Get a record by ID with tenant isolation."""
    result = await db.execute(
        select(model).where(
            and_(
                model.id == id,
                model.tenant_id == tenant_id
            )
        )
    )
    return result.scalar_one_or_none()

async def list_with_filters(
    db: AsyncSession,
    model,
    tenant_id: str,
    filters: dict = None,
    limit: int = 50,
    offset: int = 0
) -> List:
    """List records with filters and pagination."""
    query = select(model).where(model.tenant_id == tenant_id)

    if filters:
        for key, value in filters.items():
            if hasattr(model, key):
                query = query.where(getattr(model, key) == value)

    query = query.limit(limit).offset(offset)

    result = await db.execute(query)
    return result.scalars().all()

async def exists(
    db: AsyncSession,
    model,
    tenant_id: str,
    **filters
) -> bool:
    """Check if a record exists."""
    conditions = [model.tenant_id == tenant_id]

    for key, value in filters.items():
        if hasattr(model, key):
            conditions.append(getattr(model, key) == value)

    result = await db.execute(
        select(model.id).where(and_(*conditions)).limit(1)
    )
    return result.scalar_one_or_none() is not None

Performance Guidelines

1. Use Indexes

**Create indexes for frequently queried columns:**

# In your model
class AgentRegistry(Base):
    __tablename__ = "agent_registry"

    id = Column(String, primary_key=True)
    tenant_id = Column(String, ForeignKey("tenants.id"), index=True)  # ✅ Indexed
    status = Column(String, index=True)  # ✅ Indexed
    category = Column(String, index=True)  # ✅ Indexed

2. Use EXPLAIN ANALYZE

**Analyze slow queries:**

# For debugging
result = await db.execute(text("""
    EXPLAIN ANALYZE
    SELECT * FROM agent_registry
    WHERE tenant_id = :tenant_id
      AND status = :status
"""), {"tenant_id": tenant_id, "status": "autonomous"})

print(result.all())

3. Avoid N+1 Queries

**Use eager loading for relationships:**

# ✅ GOOD: Eager loading
from sqlalchemy.orm import joinedload

proposals = await db.execute(
    select(AgentProposal).options(
        joinedload(AgentProposal.agent)
    ).where(AgentProposal.tenant_id == tenant_id)
)

# ❌ BAD: N+1 queries
proposals = await db.execute(
    select(AgentProposal).where(AgentProposal.tenant_id == tenant_id)
)

for proposal in proposals:
    # This triggers a separate query for each proposal!
    agent = proposal.agent

4. Use Pagination

**Always paginate list queries:**

# ✅ GOOD: Paginated
result = await db.execute(
    select(AgentRegistry)
    .where(AgentRegistry.tenant_id == tenant_id)
    .limit(50)
    .offset(0)
)

# ❌ BAD: No pagination (could return millions of rows)
result = await db.execute(
    select(AgentRegistry).where(AgentRegistry.tenant_id == tenant_id)
)

Migration Guide

Before (Mixed Patterns)

# Some use ORM
agent = db.query(AgentRegistry).filter(
    AgentRegistry.id == agent_id
).first()

# Others use raw SQL
result = db.execute(
    "SELECT * FROM agents WHERE id = :id",
    {"id": agent_id}
)
agent = result.fetchone()

# Others use async ORM
result = await db.execute(
    select(AgentRegistry).where(AgentRegistry.id == agent_id)
)
agent = result.scalar_one_or_none()

After (Standardized)

# Always use async ORM for simple queries
agent = await get_agent_by_id(db, agent_id, tenant_id)

# Always use raw SQL for complex queries
agents = await db.execute(text("""
    SELECT ...complex aggregation...
"""), {"tenant_id": tenant_id})

Testing

Test Query Performance

import time

def test_query_performance():
    start = time.time()

    result = await db.execute(
        select(AgentRegistry).where(AgentRegistry.tenant_id == tenant_id)
    )
    agents = result.scalars().all()

    elapsed = time.time() - start

    assert elapsed < 0.1, f"Query too slow: {elapsed}s"
    assert len(agents) <= 100, "Too many results (use pagination)"

Test Tenant Isolation

def test_tenant_isolation():
    # Should not see other tenants' data
    agents = await db.execute(
        select(AgentRegistry).where(AgentRegistry.tenant_id == tenant_id)
    )

    for agent in agents:
        assert agent.tenant_id == tenant_id, "Tenant isolation violated!"

Monitoring

Track Query Performance

  • Log slow queries (>100ms)
  • Monitor query frequency
  • Track query patterns
  • Alert on N+1 queries

Metrics to Track

  • Average query time
  • Query frequency
  • Rows returned per query
  • Index usage

Best Practices Summary

  1. ✅ Always use async/await for database operations
  2. ✅ Always filter by tenant_id
  3. ✅ Use ORM for simple queries (CRUD)
  4. ✅ Use raw SQL for complex queries (aggregations, analytics)
  5. ✅ Use query helpers for common patterns
  6. ✅ Always paginate list queries
  7. ✅ Use eager loading to avoid N+1 queries
  8. ✅ Create indexes for frequently queried columns
  9. ✅ Use EXPLAIN ANALYZE for slow queries
  10. ✅ Handle missing data gracefully

References

  • SQLAlchemy Documentation: https://docs.sqlalchemy.org/
  • PostgreSQL Performance: https://www.postgresql.org/docs/current/performance-tips.html
  • Query Helpers: backend-saas/core/query_helpers.py (to be created)

Changelog

  • 2026-02-08: Initial standard created
  • 2026-02-08: Decision framework documented
  • 2026-02-08: Common patterns defined